🔀 SIMD Programming
Vectorization, Parallel Computing, CPU Instructions, Performance
APL Performance
📏 Linear Types
aplwiki.com · 3d · Hacker News
AXON: An Automated Netlist Optimization Framework for High-Speed Adders
🚀 Superoptimization
arxiv.org · 2d
facebookincubator/dispenso: The project provides high-performance concurrency, enabling highly parallel computation.
🌀 Naiad
github.com · 21h · Hacker News
Building a Free AI Image Generator on 7 GPUs: Architecture Deep Dive
🎮 WebGPU
dev.to · 12h · DEV
Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
⚡ Hardware Acceleration
releases.drawthings.ai · 1d · Hacker News
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🔢 Intel AMX
aws.amazon.com · 3d
Building CompilerSutra
🔨 Compiler Design
docs.google.com · 21h · DEV
Intel Binary Optimization Tool Changes Code Execution with Heavy Vectorization
📊 Profiling Tools
techpowerup.com · 2d
'Performance without compromise': AMD debuts first dual 3D V-Cache Ryzen CPU in potential showdown against Threadripper and EPYC siblings
⚡ Hardware Acceleration
techradar.com · 2d
Iteratively optimizing an SPSC queue
⭕ Ring Buffers
blog.c21-mac.com · 4d · r/cpp
Supercharging Redpanda Streaming with profile-guided optimization
🚀 Performance
redpanda.com · 1d
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🧩 mimalloc
danielvegamyhre.github.io · 5d · Hacker News
Building a Production-Grade Vector Database in Rust: What We Shipped
🚀 Shuttle
ferres.io · 2d · DEV
Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0
🎯 Intel IPP
newsroom.intel.com · 1d
Why I’m Building a Database Engine in C#
🔨 Incremental Compilation
nockawa.github.io · 6d · Hacker News
Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling
⚡ X-Fast Tries
arxiv.org · 21h
abdimoallim/psimd: A portable, header-only SIMD library for C (SSE2, SSE4.1, AVX/AVX2+FMA, NEON/AArch64, WebAssembly SIMD128, scalar fallback)
🔢 AVX-512
github.com · 1d · r/C_Programming
Performance & Recursion
🌳 Instruction Selection
dev.to · 4d · DEV
m0at/rvllm: rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
📊 Criterion.rs
github.com · 5d · Hacker News
yash27-lab/batch_forge: A high-performance, bare-metal inference engine for JAX and Equinox models written in Rust. Features zero-copy Safetensors loading and hand-optimized Metal/Vulkan compute kernels for Transformers, Vision Language Models, and State-Space Models
🏛️ Embassy
github.com · 4d · Hacker News